I am a PhD candidate at NYU, advised by Andrew Gordon Wilson. I work on the science of deep learning, focusing on understanding the generalization properties of deep neural networks through tools such as model compression and loss surface analysis. Using these insights into generalization, my goal is to build improved, scalable, and robust deep learning models.

My PhD research has been recognized with an ICML 2022 Outstanding Paper Award for my work on Bayesian model selection and a Best Paper Award at the ICML 2024 Theoretical Foundations Workshop for my work on understanding generalization in LLMs through the lens of compression. My research is generously supported by the Microsoft Research PhD Fellowship, the Google DeepMind Fellowship, and the Meta AI Mentorship Program. I was recently named a Rising Star in EECS by MIT and a Rising Star in Machine Learning by the University of Maryland.

In summer 2024, I was a research intern at Microsoft Research, where I worked with Miro Dudik and Jordan Ash on novel methods for efficient large language model merging for multi-task learning. In 2022-2023, I was a Visiting Researcher at Meta FAIR, where I worked with Brandon Amos to derive generalization bounds for LLMs and understand the benefits of input-dependent augmentations in image classification. In summer 2022, I worked with Bernie Wang and Richard Kurle at Amazon to understand and quantify distribution shifts in time series.

Prior to NYU, I worked with Andrea Lodi and Dominique Orban at Polytechnique Montreal to design stochastic algorithms with compelling theoretical and empirical properties for large-scale optimization. I received the Best Master’s Thesis Award for this work.

You can contact me at sl8160[at]nyu[dot]edu.


Recent News

πŸ“’ December 2024: I will be a keynote speaker at the Machine Learning and Compression Workshop @ NeurIPS 2024.

πŸ“† December 2024: I’m co-organizing the Scientific Methods for Understanding Neural Networks Workshop @ NeurIPS 2024.

πŸ₯³ September 2024: Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models was accepted to NeurIPS 2024 as a spotlight!

⭐ August 2024: I was selected as a Rising Star in EECS by MIT.

πŸ† July 2024: Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models won the Best Paper Award at the ICML Theoretical Foundations Workshop.

πŸ“’ June 2024: I gave a talk on Non-Vacuous Generalization Bounds for Large Language Models at ML Collective.

πŸ‘©β€πŸ’» June 2024: I started my summer internship at Microsoft Research NYC, where I will be working on large language model merging for multi-task learning.

πŸ₯³ May 2024: Non-Vacuous Generalization Bounds for Large Language Models was accepted to ICML 2024!

πŸ“’ May 2024: I gave talks on Non-Vacuous Generalization Bounds for Large Language Models at Cohere for AI and the UIUC ML Reading Group.


Selected Publications

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
Sanae Lotfi*, Yilun Kuang*, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson
NeurIPS 2024
🌟 Spotlight Presentation
ICML Workshop on Theoretical Foundations of Foundation Models, 2024
πŸ† Best Paper Award
[arxiv]

Non-Vacuous Generalization Bounds for Large Language Models
Sanae Lotfi*, Marc Finzi*, Yilun Kuang*, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson
ICML 2024
[arxiv, code]

Bayesian Model Selection, the Marginal Likelihood, and Generalization
Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson
ICML 2022, JMLR 2023
πŸ† ICML Outstanding Paper Award, JMLR Best Papers Track
[arxiv, code, poster, talk, slides]

PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi*, Marc Finzi*, Sanyam Kapoor*, Andres Potapczynski*, Micah Goldblum, Andrew Gordon Wilson
NeurIPS 2022
[arxiv, code]

Dangers of Bayesian Model Averaging under Covariate Shift
Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson
NeurIPS 2021
[arxiv, code, poster]

Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling
Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson
ICML 2021
🌟 Spotlight Presentation
[arxiv, code, slides]

Stochastic First and Second Order Optimization Methods for Machine Learning
Sanae Lotfi
Master’s Thesis, Polytechnique Montreal 2020
πŸ† Best Thesis Award